ABSTRACT

Using Microsoft® Excel, several interactive, computerized learning modules are developed to 
illustrate the Central Limit Theorem’s appropriateness for comparing the difference between the 
means of any two populations. These modules are used in the classroom to enhance the 
comprehension of this theorem as well as the concepts that provide the foundation for inferences 
involving the comparison of two population means. 

INTRODUCTION



There are many instances where the comparison of two population means is desirable. One approach
to these types of inferences is to select independent, random samples from each population and
compute the sample mean for each sample. The difference between the two sample means
$(\bar{x}_1 - \bar{x}_2)$ is then used as a point estimator for the difference between the two population means
$(\mu_1 - \mu_2)$. Different samples result in various values for the two sample means, and it is the sampling
distribution of $(\bar{x}_1 - \bar{x}_2)$ that describes the characteristics of this point estimator. If both sample sizes
are sufficiently large, the Central Limit Theorem leads to the conclusion that the sampling distribution of
$(\bar{x}_1 - \bar{x}_2)$ can be approximated by a normal probability distribution (a symmetrical bell-shaped
distribution). Additional characteristics of the sampling distribution are that the mean is $(\mu_1 - \mu_2)$, and the
standard deviation is $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$ (Anderson, 2008). These results
are not intuitively obvious, and despite textbook illustrations and in-class discussions, the rationale for using the
normal probability distribution often remains unclear. However, through the use of Microsoft® Excel simulations,
it is possible for students to gain a clearer understanding and appreciation of both the Central Limit Theorem and the
concepts that provide the foundation for inferences involving the comparison of two population means.


METHODOLOGY 


The Central Limit Theorem states that when a random sample of n observations is selected from a
population (any population) with a mean of $\mu$ and a standard deviation of $\sigma$, then when n is large, the sampling
distribution of the mean is approximately a normal distribution with a mean of $\mu$ and a standard deviation of
$\sigma/\sqrt{n}$ (standard error of the mean) (McClave, 2005). This theorem can also be generalized, and in doing so it
states that under rather general conditions, sums, differences, and means of random measurements drawn from any
population tend to possess, approximately, a bell-shaped distribution in repeated sampling.
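
The mean and standard deviation of the sampling distribution for the difference between two sample means follow from a short variance argument. The sketch below assumes that the two samples are independent and that each population has a finite variance; it is a supporting derivation, not a reproduction of any formula from the cited texts.

```latex
\begin{aligned}
E(\bar{x}_1 - \bar{x}_2) &= E(\bar{x}_1) - E(\bar{x}_2) = \mu_1 - \mu_2, \\
\operatorname{Var}(\bar{x}_1 - \bar{x}_2) &= \operatorname{Var}(\bar{x}_1) + \operatorname{Var}(\bar{x}_2)
  = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}, \\
\sigma_{\bar{x}_1 - \bar{x}_2} &= \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.
\end{aligned}
```

The variances add even though the means are subtracted, because $\operatorname{Var}(-X) = \operatorname{Var}(X)$ and the covariance term is zero for independent samples.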

Consider the sampling distribution for the difference between two sample means. In the following 
discussion, several interactive Microsoft® Excel modules are created that illustrate the Central Limit Theorem and 
inferences for the difference between two population means. Sampling is done from two different populations. 
Specifically, Excel simulations are created using two different population distribution families: uniform and 
exponential. In each case, the parameters associated with a population distribution can be modified to allow for the 
simulation of a wide variety of populations within each family. The simulation techniques used below follow the 
procedures found in Moen and Powell, 2005. The actual Excel formulas are also found in that paper. These 
techniques provide the ability to simulate the selection of repeated random samples from uniform and exponential 
population distributions. The simulated sampling distribution can then be represented with a frequency distribution 
and histogram. The results also include calculations for the mean and standard deviation of the estimated sampling 
distribution. The frequency distribution, histogram, and descriptive statistics are dynamic.
That is, each time function key F9 (Calculate) is pressed in Excel, new samples are simulated, the differences
between the sample means are recalculated, and the accompanying frequency distribution, histogram, and
descriptive statistics are recomputed. All of the illustrations below are based on the selection of 500 random samples,
each of size $n_1 = n_2 = 30$.
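
The repeated-sampling procedure described above can be sketched outside of Excel. The Python fragment below is only an illustration of the idea and is not the Excel formulas from Moen and Powell; the function names `simulate_differences` and `frequency_distribution`, the seed, the ten equal-width classes, and the two uniform populations are all assumptions chosen for the example.

```python
import random
import statistics

def simulate_differences(draw1, draw2, n1=30, n2=30, reps=500, seed=None):
    """Return `reps` simulated values of (mean of sample 1 - mean of sample 2)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        mean1 = statistics.fmean(draw1(rng) for _ in range(n1))
        mean2 = statistics.fmean(draw2(rng) for _ in range(n2))
        diffs.append(mean1 - mean2)
    return diffs

def frequency_distribution(values, bins=10):
    """Count values in `bins` equal-width classes spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last class
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]

# Illustrative populations (an assumption for this sketch): uniform(20, 80) and uniform(10, 50).
diffs = simulate_differences(lambda r: r.uniform(20, 80), lambda r: r.uniform(10, 50), seed=1)
sim_mean = statistics.fmean(diffs)
sim_sd = statistics.stdev(diffs)
table = frequency_distribution(diffs)

print(f"simulated mean = {sim_mean:.4f}, simulated std. dev. = {sim_sd:.4f}")
for lower, upper, count in table:
    print(f"{lower:8.2f} to {upper:8.2f}: {'*' * (count // 5)} ({count})")
```

Changing the seed plays the role of pressing F9: a new set of 500 differences is drawn and the summary statistics and text histogram are recomputed.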

RESULTS WHEN BOTH POPULATIONS ARE UNIFORMLY DISTRIBUTED 

Consider the continuous uniform probability distribution with parameters a and b, where a < b. The
probability density function for a random variable x is given by

$$f(x) = \begin{cases} 1/(b - a), & a < x < b \\ 0, & \text{elsewhere} \end{cases}$$

where $E(x) = \mu = (a + b)/2$ and $\mathrm{Var}(x) = \sigma^2 = (b - a)^2/12$ (Anderson, 2008).



The Excel simulation module created for this population distribution allows the user to select values for
parameters a and b. Consider the following case associated with estimating the difference between the population
means when both populations are continuous uniform probability distributions. For illustration purposes, suppose
the first population distribution has parameters a = 20 and b = 80, and the second population distribution has
parameters a = 10 and b = 50. Then, $E(x) = \mu_1 = (20 + 80)/2 = 50$, $\mathrm{Var}(x) = \sigma_1^2 = (80 - 20)^2/12 = 300$, and the
standard deviation $\sigma_1 = \sqrt{\mathrm{Var}(x)} = 17.321$ for the first distribution, while $E(x) = \mu_2 = (10 + 50)/2 = 30$,
$\mathrm{Var}(x) = \sigma_2^2 = (50 - 10)^2/12 = 133.33$, and the standard deviation $\sigma_2 = \sqrt{\mathrm{Var}(x)} = 11.547$ for the second
distribution. If independent random samples of size $n_1 = n_2 = 30$ are selected from these two populations, it
follows that the mean of the sampling distribution for $(\bar{x}_1 - \bar{x}_2)$ is $(\mu_1 - \mu_2) = (50 - 30) = 20$ and the
standard deviation is

$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{300}{30} + \frac{133.33}{30}} = 3.8006.$$
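
The arithmetic in this example is easy to check programmatically. The following Python sketch recomputes the theoretical mean and standard deviation from the parameters a and b; the helper names `uniform_mean_var` and `difference_of_means_params` are hypothetical and introduced only for this illustration.

```python
import math

def uniform_mean_var(a, b):
    """Mean and variance of a continuous uniform distribution on (a, b)."""
    return (a + b) / 2, (b - a) ** 2 / 12

def difference_of_means_params(mean1, var1, n1, mean2, var2, n2):
    """Mean and standard deviation of the sampling distribution of (x1_bar - x2_bar)."""
    return mean1 - mean2, math.sqrt(var1 / n1 + var2 / n2)

mu1, var1 = uniform_mean_var(20, 80)   # 50.0 and 300.0
mu2, var2 = uniform_mean_var(10, 50)   # 30.0 and 133.33...
mean_diff, sd_diff = difference_of_means_params(mu1, var1, 30, mu2, var2, 30)
print(f"mean = {mean_diff:.4f}, std. dev. = {sd_diff:.4f}")  # mean = 20.0000, std. dev. = 3.8006
```

Other choices of a, b, and the sample sizes can be substituted to see how the spread of the sampling distribution shrinks as the sample sizes grow.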


Figure 1 provides the histogram and descriptive statistics for one iteration of this simulation example. Note
that when samples of size 30 have been selected from two continuous uniform probability distributions, the
simulated sampling distribution’s shape is approximately normal and the mean and standard deviation are close to
$(\mu_1 - \mu_2)$ and $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$, respectively.


Figure 1 



Population Mean = 20. 

Population Std. Dev. = 3.8006 

Simulated Sampling Distribution Mean = 20.0062 
Simulated Sampling Distribution Std. Dev. = 3.8988 

RESULTS WHEN BOTH POPULATIONS ARE EXPONENTIALLY DISTRIBUTED 

The exponential probability distribution is often used to describe the time between arrivals (interarrival time,
IAT) at a service facility or the service time required at a facility.

Consider the continuous exponential probability distribution with parameter $\mu$, a rate per unit of time (so that
$1/\mu$ is the mean time). The probability density function for a random variable x is given by

$$f(x) = \begin{cases} \mu e^{-\mu x}, & x > 0,\ \mu > 0 \\ 0, & \text{elsewhere} \end{cases}$$

where $E(x) = 1/\mu$ and $\mathrm{Var}(x) = \sigma^2 = 1/\mu^2$ (Naylor, 1968).
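
One common way to generate exponential observations from uniform random numbers is the inverse-transform method, which solves $u = 1 - e^{-\mu x}$ for x. Whether the Excel modules use exactly this formula is an assumption here; the Python sketch below only shows that draws produced this way have a mean near $1/\mu$ and a variance near $1/\mu^2$. The function name `exponential_draw`, the seed, and the value $\mu = 1/3$ are illustrative choices.

```python
import math
import random
import statistics

def exponential_draw(mu, rng):
    """Inverse-transform draw from f(x) = mu * exp(-mu * x), x > 0."""
    u = rng.random()                 # uniform on [0, 1)
    return -math.log(1.0 - u) / mu   # 1 - u lies in (0, 1], so the log is defined

rng = random.Random(2008)            # fixed seed so the run is repeatable
mu = 1 / 3                           # rate parameter; the mean is 1/mu = 3
sample = [exponential_draw(mu, rng) for _ in range(50_000)]
sample_mean = statistics.fmean(sample)
sample_var = statistics.variance(sample)

print(f"sample mean     = {sample_mean:.3f} (theory {1 / mu:.3f})")
print(f"sample variance = {sample_var:.3f} (theory {1 / mu ** 2:.3f})")
```

Python's standard library also offers `random.expovariate(mu)`, which takes the same rate parameter and could replace the hand-written draw.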

The Excel simulation module created with this population distribution allows the user to select values for
the parameter $\mu$. Consider the case associated with estimating the difference between the population means when
both populations are exponential probability distributions. For illustration purposes, suppose the first population
distribution has $\mu_1 = 1/3$ as a parameter. Then, $E(x) = 1/\mu_1 = 3.0$, $\mathrm{Var}(x) = \sigma_1^2 = 1/\mu_1^2 = 9.0$, and the standard
deviation $\sigma_1 = \sqrt{\mathrm{Var}(x)} = 3.0$. Let $\mu_2 = 1/2$ for the second population distribution. It follows that
$E(x) = 1/\mu_2 = 2.0$, $\mathrm{Var}(x) = \sigma_2^2 = 1/\mu_2^2 = 4.0$, and the standard deviation $\sigma_2 = \sqrt{\mathrm{Var}(x)} = 2.0$. If
independent random samples of size $n_1 = n_2 = 30$ are selected from these two populations, it follows that the mean
of the sampling distribution for $(\bar{x}_1 - \bar{x}_2)$ is the difference between the population means,
$(1/\mu_1 - 1/\mu_2) = (3 - 2) = 1$, and the standard deviation is

$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{9}{30} + \frac{4}{30}} = 0.6583.$$

Figure 2 provides the histogram and descriptive statistics for one iteration of this simulation example. Just
as with the example of two continuous uniform probability distributions, when samples of size 30 have been selected
from two exponential probability distributions, the simulated sampling distribution’s shape is approximately normal
and the mean and standard deviation are close to the difference between the population means and
$\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$, respectively.


Figure 2 


Simulated Sampling Distribution for the 
Difference Between Two Means 



Numerical Descriptive Measures 

Population Mean = 1.0000 
Population Std. Dev. = 0.6583 

Simulated Sampling Distribution Mean = 1.0072 
Simulated Sampling Distribution Std. Dev. = 0.6614 

RESULTS WHEN THE FIRST POPULATION IS UNIFORMLY DISTRIBUTED AND THE SECOND 
POPULATION IS EXPONENTIALLY DISTRIBUTED 

American Journal of Business Education - Fourth Quarter 2008, Volume 1, Number 2

In the previous two examples, the population distributions have both been selected from the same family of
distributions. This does not need to be the case, however, because the Central Limit Theorem applies to random
samples selected from any population with a mean of $\mu$ and a standard deviation of $\sigma$. Thus, consider the case where
the first population has a continuous uniform probability distribution, while the second population is exponentially
distributed. As before, the parameters associated with both population distributions can be modified to allow for the
simulation of a wide variety of populations within each family. For illustration purposes, one population
distribution from each of the two earlier examples will be used. That is, suppose the first population distribution is
uniformly distributed with parameters a = 20 and b = 80, and the second population distribution is exponentially
distributed with parameter $\mu = 1/3$. Then, $E(x) = \mu_1 = (20 + 80)/2 = 50$, $\mathrm{Var}(x) = \sigma_1^2 = (80 - 20)^2/12 = 300$, and the
standard deviation $\sigma_1 = \sqrt{\mathrm{Var}(x)} = 17.321$ for the first distribution, and $E(x) = \mu_2 = 1/\mu = 3.0$,
$\mathrm{Var}(x) = \sigma_2^2 = 1/\mu^2 = 9.0$, and the standard deviation $\sigma_2 = \sqrt{\mathrm{Var}(x)} = 3.0$ for the second distribution. If
independent random samples of size $n_1 = n_2 = 30$ are selected from these two populations, it follows that the mean
of the sampling distribution for $(\bar{x}_1 - \bar{x}_2)$ is $(\mu_1 - \mu_2) = (50 - 3) = 47$ and the standard deviation is

$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{300}{30} + \frac{9}{30}} = 3.2094.$$
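
A histogram is one way to judge normality; another quick check is to compare the share of simulated differences that fall within one, two, and three standard deviations of the mean with the shares expected under a normal distribution. The Python sketch below does this for a uniform population on (20, 80) and an exponential population with mean 3. It is an illustration rather than the Excel module itself, and the seed and the helper name `share_within` are assumptions.

```python
import math
import random
import statistics

rng = random.Random(47)   # arbitrary fixed seed
n, reps = 30, 500

diffs = []
for _ in range(reps):
    uniform_mean = statistics.fmean(rng.uniform(20, 80) for _ in range(n))
    expo_mean = statistics.fmean(rng.expovariate(1 / 3) for _ in range(n))  # rate 1/3, mean 3
    diffs.append(uniform_mean - expo_mean)

theory_mean = 50 - 3
theory_sd = math.sqrt(300 / n + 9 / n)

def share_within(k):
    """Fraction of simulated differences within k theoretical std. devs. of the mean."""
    return sum(abs(d - theory_mean) <= k * theory_sd for d in diffs) / reps

for k, normal_share in ((1, 0.6827), (2, 0.9545), (3, 0.9973)):
    print(f"within {k} sd: simulated {share_within(k):.3f}, normal {normal_share:.4f}")
```

With only 500 replications the simulated shares fluctuate by a few percentage points from run to run, so close rather than exact agreement with the normal shares is what should be expected.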

Figure 3 provides the histogram and descriptive statistics for this simulation example. Note that even
though the two population distributions were selected from different probability distribution families, the simulated
sampling distribution’s shape is still approximately normal and once again the mean and standard deviation are close
to $(\mu_1 - \mu_2)$ and $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$, respectively.


Figure 3 


Simulated Sampling Distribution for the 
Difference Between Two Means 



Numerical Descriptive Measures 

Population Mean = 47. 

Population Std. Dev. = 3.2094 

Simulated Sampling Distribution Mean = 47.0412 
Simulated Sampling Distribution Std. Dev. = 3.2249 

The objective of this paper has been to develop a better understanding of the Central Limit Theorem’s
appropriateness when comparing the difference between the means of any two populations. Microsoft® Excel
provides the opportunity to create simulations that demonstrate this non-intuitive theorem. It can be clearly
observed that the simulated sampling distributions for the difference between two means follow a normal probability
distribution fairly closely for samples of size 30. The approximation worsens as the sample sizes drop farther
below 30 and improves for samples larger than 30. The simulations
also illustrate that the mean and standard deviation for the sampling distribution of $(\bar{x}_1 - \bar{x}_2)$ are
$(\mu_1 - \mu_2)$ and $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$, respectively. Only the continuous uniform probability
distribution and the exponential probability
distribution were considered as population distributions in this paper. However, these same results can be illustrated
with any two population distributions that have a finite mean and standard deviation. By demonstrating these
simulations in a statistics class, students
will gain a clearer understanding and a better appreciation of the usefulness of the Central Limit Theorem in
statistical analyses.
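
The effect of sample size on the quality of the normal approximation can be made concrete by tracking the skewness of the simulated differences, which is zero for a normal distribution. The Python sketch below uses two exponential populations with means 3 and 2 and compares n = 2, 30, and 100. It is an assumption-laden illustration: skewness is only one measure of non-normality, and 4,000 replications are used instead of 500 so that the skewness estimates are less noisy.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: third central moment divided by the cubed std. dev."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean((v - m) ** 3 for v in values) / sd ** 3

def simulated_skewness(n, reps=4000, seed=30):
    """Skewness of simulated (x1_bar - x2_bar) for exponential populations with means 3 and 2."""
    rng = random.Random(seed)
    diffs = [
        statistics.fmean(rng.expovariate(1 / 3) for _ in range(n))
        - statistics.fmean(rng.expovariate(1 / 2) for _ in range(n))
        for _ in range(reps)
    ]
    return skewness(diffs)

results = {n: simulated_skewness(n) for n in (2, 30, 100)}
for n, skew in results.items():
    print(f"n = {n:3d}: skewness of simulated differences = {skew:+.3f}")
```

For very small samples the simulated differences remain visibly skewed toward the population with the larger mean, and the skewness shrinks toward zero as n increases.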